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ABSTRACT 

We re-derive the made-to-measure method of ISver fc Tremamel (|1996fl for modelling 
stellar systems and individual galaxies, and demonstrate how extensions to the made- 
to-measure method may be implemented and used. We illustrate the enhanced made- 
to-measure method by determining the mass-to-light ratio of a galaxy modelled as 
a Plummer sphere. From the standard galactic observables of surface brightness and 
line-of-sight velocity dispersion together with the Gauss-Hermite coefficient of the 
line-of-sight velocity distribution, we successf ully recover the true mass-to-light ratio 
of our toy galaxy. Using kinematic data from iKlevna et aLl (|2002[ ). we then estimate 
the mass-to-light ratio of the dwarf spheroidal galaxy Draco achieving a V-band value 
of 539 ± 136 M Q /L Q . We describe the main aspects of creating a made-to-measure 
galaxy model and show how the key modelling parameters may be determined. 

Key words: galaxies: kinematics and dynamics - galaxies: individual: Draco - meth- 
ods: N-body simulations - methods: numerical 



1 INTRODUCTION 

Within the field of galactic and stellar dynamics, it has be- 
come common practice to model kinematic observations of a 
galaxy in order to interpret the observations and to under- 
stand better the underlying dynamical structures within the 
galaxy. N-body modellin g is one of the techniques employed. 
ISver fc Tremainel (l996) categorised methods for creating N- 
body systems into 3 groups namely distribution function 
based, moment or Jeans equation based and orbit based, 
placing their made-to-measure (M2M) method in a new 
particle based group. The M2M method is, however, sim- 
ilar to the methods in th e orbit based group notably the 
method of ISchwarzschildl (|l979l ). Schwarzschild's method 
since its inception has undergone both si gnificant develop- 
ment (for ex ample, [C hanam e et all 2008) and explo i tation 
(for example. ICappellari et al.ll2007n . lChaname et al.l (120081 ) 
extended the method to hand l e disc rete stellar kinematic 
data sets, and ICappellari et all (|2007l ) used Schwarzschild's 
method to model photometric and kinematic observations 
of elliptical galaxies from the SAURON survey. By compar- 
ison, Syer and Tremaine's M2M method remained largely 
un utilised until the bulg e and disk model of the Milky Way 
in iBissantz et all (|2004h . More recently, the M2M method 
was enhance d by the inclusion of a capabil i ty to model kine- 
matic da ta Jjourdeuil fc Emselleml |2007| . Ide Lorenzi et al] 
l2007l andlde~ Lorenzi et al.ll2008l ). and was applied to model 
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the elliptical galaxies NGC 3377 and NGC4697. More re- 
cently still, iDehnenl i|2009h implemented an alternative 
weight adaption mechanism within the M2M method. 

In this paper we remain true to the outline in 
ISver fc Tremainel (Il996l ) but reframe the M2M method (sec- 
tion[2J slightly to improve the theoretical basis for the weight 
evolution equation, and introduce 2 further constraints - the 
first on the sum of the particle weights and the second on the 
isotropy of the velocity dispersion. We show how the param- 
eters necessary to tune the model for a given situation may 
be determined (section [3). We use the method to determine 
the mass-to- light ratio of a toy galaxy (section [SJ and then 
to estimate the mass-to-light ratio of the dwarf spheroidal 
galaxy Draco (section [6}. We aim to provide sufficient detail 
and advice to enable others to produce their own implemen- 
tation of the M2M method. Our implementation is described 
in section [3] 



2 THE M2M METHOD 
2.1 Outline 

In brief, the M2M method is concerned with modelling 
stellar systems and individual galaxies as a system of test 
particles orbiting in a gravitational potential. Weights are 
associated with the particles and are evolved over time 
(many orbital periods) such that, by using these weights, 
observational measurements of a real galaxy are repro- 
duced. The method uses these observational measurements 
as constraints on the model. Whilst it is tempting to 
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think of the particles as representing stars, the particles 
are mo re accurately described as p hase space density ele- 
ments (jrlernauist fc Ostrikerl 11992 ) whose motion is inte- 
grated along the characteristic curves of the collisionless 
Boltzmann equation. The gravitational potential may be 
prespecified or determined self- consistently, and may con- 
tain a dark matter component. ISchwarzschi ld ( 1979) uses 
a similar approach but does not modify the weights during 
the main modelling run obtaining them instead at the end 
via linear programming. 

Reviewed in the next sec tion, section | 2.2I a re the the- 
oretical approaches tak en in ISver fc Tremaina (|l996l ) and 
Ide Lorenzi et al] l|2007f ). We then recast the method from 
a maximum likelihood starting point paying particular at- 
tention to the origin of the derivative term in the weight 
evolu tion equation. Th e part icle weight convergence analy- 
sis m ISver fc Tremaind (l996) is not revisited in this paper. 



2.2 Theory 

The galactic observables used with a M2M model are mo- 
ments of the distribution function and have the general form 



Y, 



Kj (r, v)f(r, v)d 3 rd 3 v, 



(1) 



where subscript j indicates an instance of the observable, 
Kj is the kernel for the observation Yj and f(r,v) is the 
phase space distribution function. In this context, typical 
observables are surface brightness and surface brightness 
times the luminosity weighted line-of-sight velocity disper- 
sion squared. For a model of N particles, this integral form 
is translated into 



N 

\ " 



(2) 



lection function such that only the particles which have a di- 
rect effect on the observation j/j are included in the sum. The 
goal of the M2M method is to evolve the particle weights, 
Wi(t), such that the time averaged model observations yj 
match the actual observations Yj. 

The weights, representing (in our case) the luminosity 
of individual particles implemented as a fraction of the total 
luminosity of the galaxy being modelled, are evolved using 



dt 



Wi (t) 



Kj{ri{t),Vi{t)) 



A,(t), 



(3) 



where index j runs from 1 to J, A,-(t) = Ij , Zj is 

arbitrary, and the kernels Kj are implemented by binning 
the model's particle data, e is small, positive and constant. 
Calculations of the Aj from the model are exponentially 
smoothed to reduce the impact of particle counting effects 
using 

d_ a 

dt 



Aj(t) = a(Aj(t) - Aj(t)), 



(4) 



where a is small, positive and co nstant. Aj(t) is t h en re - 
placed with Aj(t) in equation [3] Ide Lorenzi et al] (|2007r ) 
take the equivalent approach of smoothing yj. 

The weight evolution equation may be extended in 
2 ways - firstly by the introduction of a profit function 



jSver fc Tremainel [l996) to constrain the overall weight 
evolution and smooth observable reproduction (regularisa- 
tion), and secondly by the inclusion of observational errors 
|de Lorenzi et al]|2007f ). The equation then becomes 



du>i _ 
~dt ~ 

where 



dF 
' duii ' 



F 



Q 1 2 



(5) 



(6) 



and is to be maximised, e, a an d u, in equation s [3] |4] and 
[6] are discussed in more detail in ISver fc Tremainel (1996). 
Here, the equivalent discussion is delayed until section 14.11 
Note that to arrive at equation|3]from equation|6]the kernels 
must not depend on the particle weights. 5*, the profit func- 
tion, varyingly known as the relative entropy or Kullback- 
Leibler divergence from some initial value (prior) of the par- 
ticle weights, is given by 



^ Wi In 



where the wij are the priors, x is calculated as 



(7) 



(8) 



with Aj being the exponentially smoothed form of either 
a relative difference between the model and target observa- 
tions 



yj - Yj 

Y, 



(9) 



or, if o(Yj) represents the measurement error in Yj, in a 
more usual \ 2 form 

v-j - Yj 



Ai = 
J tr(Y 3 



(10) 



If multiple classes (K) of constraining observables are 
used, it is computationally convenient to replace F by 



F = » S - \ X] 



(11) 



The number of observations within each class may be differ- 
ent and may be subject to a different binning schemes. 

Our derivation of the weight evolution equation is very 
similar to that above and is based on constructing a likeli- 
hood function giving the likelihood of the model reproducing 
the measured galactic observations and then maximising it 
(in log form) subject to a time derivative constraint on the 
relative entropy of the weights. 

Redefining F now as 



1 2 . 1 dS 
and substituting for S gives 



1 2 



ld_ 

Idt 



Wi(t) In 



i(t) 



(12) 



(13) 
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Wi(t) 
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Maximising F with r espect to the weights gives the 
ISver fc Trema inc (1996) weight evolution equation 



dt 



Wi(t) = —ewi(t) 



E 



JO(ri(t),w 4 (t)) 



Aj(t) 



(14) 



The M2M metho d is similar to a maximum entropy 
method (for example, iRichstone fc Tremainel Il988l ) but in 
this case the prior reflects the need for the particle weights 
to be constant over time. 

Additional observational or modelling constraints may 
be included simply by modifying F. Whether the weights 
will converge or whether the observations will be reproduced 
requires either experi mentation or a con v ergen ce analysis to 
be performed as per ISver fc Tremainel (|l996l ). In general, 
constraints which are expressed as squared 'distance mea- 
sures' appear to perform satisfactorily. A simple extension 
to the method which meets these caveats is to amend the 



2^LM 



K 
k 



(15) 



where the \k are small positive parameters and may be used 
to rescale xl, or to express the relative priority of observable 
class fc within the M2M m odel. Note that, while n ot express- 
ing x 2 as in equation 1151 Ide Lorenzi et al] l|2007l ) employ a 
scaling factor, not dissimilar in function to the At, in their 
'force of change' equation (their e"). 

Similarly, given that a weight is a fraction of the total 
galaxy luminosity L, that is 



(16) 



then it is not unreasonable to require that the method does 
not alter the total luminosity and an appropriate constraint 
to include is the following (minimisation) term in F 




Wi — 1 



(17) 



Re-normalisation of the weights was considered and rejected 
as it would destro y the s moot hing history built up in th e Aj . 
ISver fc Tremainel (jl996r i and Ide Lorenzi et al.l (|2007h con- 



Dehnen (2009) has an alter- 



tain no such similar constraint, 
native scheme for modifying F to achieve weight conserva- 
tion. 

Discrete observables (for example, measurements of 
the line-of-sight velocities of individual stars) are incorpo- 
rated into the method as follows. The probability, Pd.j = 
PD,j(x±j, v^j), of the model reproducing a discrete line-of- 
sight velocity measurement is found by convolving the line- 
of-sight velocity distribution (losvd) with a Gaussian incor- 
porating the observational errors aj. 



VD.j = 



losvd(a;xj, «||) exp — 



2o) 



dv h \Y$) 



where the || and _L subscripts indicate parallel and perpen- 
dicular to the line of sight. Equation 1141 gains an additional 
constraint term 



D 

Ld — Ad ln(pn,. 



(19) 



where Ad is a small positive parameter. Note that this is 
not the only way of incorporating discrete observables. For 
example, the PD,j could have been included directly in the 
log likelihood function used to create \ 2 - 

From expressing the line-of-sight velocity distribution 
in terms of the distribution function 



losvd (x± 



J dx\\d 2 v±f(x, v) 
J dx^d s vf(x, v) 



(20) 



PD,j is calculated from the model as 



wi exp 



PD,j = 



("II 



27TO",' 



(21) 



Wi 



where the selection function Sij takes the value 1 if particle i 
contributes to observation j and is otherwise. The contri- 
bution to the weight evolution equation is found by differ- 
entiating the log likelihood function Ld with respect to the 
particle weights and independently expone ntially smoothing 
the re sulting numerator and denominator. Ide Lorenzi et al.l 
(2008) arrive at an equivalent expression. Proper motion 
data could be incorporated into the method in a similar 
fashion but this is not explored further here. 

If we require a model with an isotropic velocity disper- 
sion, this may be achieved by defining a model observable 



Ef Lwi&ij 



(22) 



where v r and v t are the radial and tangential velocities, and 
including 



o Ais °E 



2 



(23) 



as a minimisation term in F. These expressions come di- 
rectly from using luminosity weighted velocity dispersions 
in calculating the f3 anisotropy parameter 



1 



v t_ 

2v? 



(24) 



and then setting /3 = 0. The denominator ^ N LwiSij ei- 
ther should be exponentially smoothed as per equation [4] 
to reduce particle counting effects, or alternatively may be 
replaced by an observationally derived value by recognising 
that Lj = LwiSij is the luminosity of radial shell j. 
There is a third option which is to create a composite con- 
straint of luminosity times the original constraint. The Lj 
approach is used for the isotropy constraint in the rest of this 
paper. These 3 options apply not just to the isotropy con- 
straint but to all constraints where a sum of weights appears 
in the denominator. To be precise, the isotropy constraint is 
a constraint on (3 and it will not enforce strict isotropy with 
the same dispersion in each of the 3 velocity components 
v r , vq and v^. Extending the constraint to the case where 
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(3 = /3(r) is st raightforward and has been implemented in 
iDehnenl (|2009l ). 



2.3 Model observables and kernels 

In this section we list the model observables and kernels 
from which we use an appropriate subset in the M2M mod- 
els described in this paper. We show the selection func- 
tion separately from the actual kernel and also abbrevi- 
ate Kj(ri(t),Vi(t)) to Kji. Within a given class of observ- 
ables, one particle contributes to only one observable. No 
attempts have been made to smear particles to mimic the 
effects of an obse r vation al po int-spread function, say , as in 
ISver fc Tremaind l| 19961 ) and Ide Lorenzi et al.l (|2007T ). The 
total luminosity of the galaxy being modelled is L. 



(i) Luminosity density 



N 



v, 



V, 



where Vj is the volume of the associated model bin. 



(ii) Surface brightness 
Lvji 



Ih 



N 



(25) 
(26) 

(27) 
(28) 



where Aj is the area of the associated model bin. 

(iii) Surface brightness times luminosity-weighted line-of- 
sight second velocity moment 



Vj 



Kji = 



i 

Lvl 



A, 



A, 



(29) 



(30) 



(iv) Luminosity-weighted line-of-sight second velocity 
moment 



Vj 



Kji 



N 2 

i 

Lvl 



(31) 



(32) 



where Ij is the measured surface brightness. This form of 



the observable is an alternative to that in item (iii 



(v) Surface luminosity times line-of-sight velocity distri- 
bution Gauss-Hermite coefficient (n) 



= v^7 1 L 5 l] w l H n (v 1 - lolm ,i) exp(-«n 0rmi ,/2X33) 



K-i 



= ^27 1 LH n (v noInli i) exp( — «n rm,i/2) 



(34) 



where v n orm,i = (v\\i — v bcst ) /a bcat . 7, v bcat and cr best are the 
line strength, the mean line-of-sight velocity and dispersion 
from the best Gaussian fit to the observed line-of-sight veloc- 
ity distribution data. To determine the Hermite polynomial 
values, we use the recurrence relationship 



H P (x) 



xH p -i{x) 



-Hp-,(x), p>2 (35) 



with H (x) = and Hi(x) = y/2x. 

Modelling the line-of-sight velocity distribution with a 
truncated Gau ss-He rmite polynomial series is dis cussed in 
iGerhardl [| 19931 ) and Ivan der Marel fc Franxl l| 19931 ). 

We have not at this time implemented any schemes 
for deprojecting surface brightness to give a lumino sity 
density distribut i on as in Ijourdeuil fc Emselleml (|2007l ) or 
Ide Lorenzi et al . (2008) where a multi-Gaussian expansion 
ijEmsellem et al.lll994l ) is used to construct the distribution 
for their elliptical galaxies. Deprojection is not an integral 
part of M2M method. If we require a luminosity density, we 
assume that deprojection could be performed and just use 
a theoretical luminosity density, modified as in section 13.21 



3 IMPLEMENTATION 

We describe in this section some of the key practical issues 
to be addressed in designing and implementing a software 
system to meet the theoretical design in section [5] 



3.1 Process and data flows 

We have decomposed the process and data flows into 3 main 
phases. The first phase, preparation, covers creating the par- 
ticle initial conditions, and either creating from theoretical 
functions the observational constraints to be used, or ma- 
nipulating observations of real galaxies into a form suitable 
for modelling. The second phase is the actual running of 
the M2M model, and the third phase is concerned with the 
analysis of the output from the modelling run. The analysis 
phase is split into 3 components namely particle weight con- 
vergence, reproduction of the observational constraints and 
particle kinematics. 

The execution phase follows quite naturally from the 
equations in section [2] and is shown in Figure [T] Given 
the number of particles used in modelling runs, in our case 

10 4 - 2 x 10 6 , we have parallelised our implementation 
so that multiple computer processors may be used to re- 
duce the overall execution elapsed times. We adopt a simple 
strategy whereby 1 processor controls the modelling run and 
is responsible for the collation and smoothing of the model 
produced observations and calculating the weight evolution 
constraint terms. The other processors are responsible for 
orbiting their subset of the particles, calculating their contri- 
bution to the model observations and updating their particle 
weights. At the end of a run, the control processor collates 
all the particle data for subsequent analysis. We find that 
this parallelisation strategy, without change, works accept- 
ably well across a range of hardware platforms from single 
dual processor machines, clusters of PCs and on high per- 
formance computing systems with fast inter-processor data 
connections. As shown in Table [1] doubling the number of 
processors approximately halves the computer elapsed time. 
Note that our implementation does not as yet handle self- 
consistent potentials created by the particles themselves. 

The weight convergence algorithm described in section 
I3.5l we have implemented in computer memory with the slave 
processors being responsible for monitoring the convergence 
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Table 1. MPI scaling 



Processors 


Elapsed time (m) 


8 


18.3 


16 


8.0 


32 


4.0 


(51 


2.0 



The elapsed time figures are from a 5 X 10 5 particle model with 
3 observational constraints run for 100 dynamical time units in a 
high performance computing environment. 



Input modelling 
parameters 

Load observational 
constraints 

I 



Load particle initial 
conditions 



Advance particles 
around their orbits 



Construct model 
observations 
using particle weights 




Figure 1. Execution phase flowchart. The parallelised processes 
are contained within the dashed box. 



of their own particles. At the end of a modelling run, the 
controlling processor collates the weight convergence data 
for subsequent analysis. Clearly, alternative schemes are pos- 
sible, for example using filestore. 



In modelling galaxies, a system of large spatial extent, 
possibly infinite if theoretical functions are used, is being 
modelled in a computer system. Inevitably, model bound- 
ary effects occur primarily as a result of model observable 
values falling off faster with radius than would be the case 
with theoretical functions, and the limited number of par- 
ticles close to the boundary. We solve this issue in one of 
two ways, either by the simple expedient of oversizing the 
models and only analysing the central portions, or by trun- 
cating the distribution function and constructing a tailored 
observable function. The distribution function is truncated 
by energy, limiting the energy to the range between the po- 
tential energy at the origin and at the model boundary, and 
setting the distri bution function to zero for energies outside 
of this range. As [Kashlinskvl (| 19881 ) points out, this is not 
the only way of truncating the distribution function. For ex- 
ample, truncation by radius, resulting in an increase in the 
number of (bound) circular orbits, may be appropriate but 
we have conducted no experiments using this approach. 

Assuming a distribution function which is a function of 
energy only, we construct the luminosity density function 
via a particle realisation of the truncated distribution func- 
tion over the radial extent of the M2M model. To ensure 
the correct radial distribution of particles, we determine the 
density of particles with a given energy, Ve,{t), by integrating 
the distribution function over velocity space to give 



(r) = 4nf(E)y / 2 (E — 4>(r)). 



(37) 



We obtain the radial position for a particle by sampling uni- 
formly randomly from the 'fraction within radius' function, 
Ne(< r) given by 



N E (< r) = 



16tt 2 J r ^2 (E - <f>{r))r 2 dr 



(38) 



where g(E) is the density of states function. 

Figure [2] compares a luminosity density function cre- 
ated in this way with the usual analytic density function for 
a Plummer sphere (radii are in units of the core radius). As 
can be seen, the two functions match in the inner part of 
the model but with the constructed function going to zero 
as required at the boundary of the model. For data cre- 
ation purposes, the constructed luminosity density function 
is used in tabular form and integrated numerically to cre- 
ate other luminosity related functions, for example surface 
brightness or luminosity weighted velocity dispersion. 

Luminosity density is an example of an observable 
which decreases with radius. Modelling observables which 
increase with radius, for example a rising velocity dispersion, 
the effects are more extreme. As can be seen from Figure [3] 
the inner part of the model where there is a good match 
between the theoretical function and the particle realisation 
is reduced to as 25% of the radial extent of the model. 



3.2 Finite modelling 

All the models we use in this paper are either spherical or 
spheroidal. We set a maximum radius for a model and ar- 
range that particles do not 'escape' from the model by set- 
ting an energy constraint on their initial velocities (v). That 
is 



3.3 Particle initial conditions 

All particles are given the same initial weight and prior equal 
to 1/JV, where TV is the number of particles in the model. 

For the particles' initial spatial and velocity coordinates 
we use one of three schemes, 



v 2 ^ 2 [ei(boundary) — (^(initial position)] . (36) 



(i) spatial coordinates allocated such that the particle dis- 



6 R. J. Long and S. Mao 



le+OO 




0123456789 10 
r 

Figure 2. Comparison for a Plummcr sphere of the theoretical 
luminosity density function and the equivalent function from a 
particle realisation. 



le+00 



tional data from a real galaxy is being used, and the third 
scheme, when theoretical models are being used. All the 
schemes clearly need at least a potential to be specified in 
order to be used. We have however deliberately separated 
the creation of the initial conditions from the main mod- 
elling software to increase flexibility - we may choose to run 
the model with a different potential for example. 



3.4 Observational constraints 

Clearly any observations of a real galaxy will need to go 
through a process of manipulation and conversion in order 
to get them into a form where they can be used with a 
M2M model. For theoretical galaxy models, for the purposes 
of developing the M2M method, we create constraint data 
by sampling from a Gaussian distribution with mean the 
function value, and standard deviation calculated from a 
pre-specified relative error. We constrain data values so pro- 
duced by limiting them to be within n-sigma of the function 
values (n is typically 2 or 3). 




Figure 3. Comparison for a Wilkinson ct al . (2002) model, with 
the potential power law parameter a = —0.5, of the rising theo- 
retical velocity dispersion and the equivalent from a particle re- 
alisation. Only the inner 25% of the model is usable. 



tribution matches the luminosity distribution, and the ve- 
locity coordinates uniformly randomly distributed, 

(ii) spatial coordinates allocated such that the particle 
distribution matches the luminosity distribution, and the 
velocity coordinates sampled randomly from a Gaussian dis- 
tribution created from the velocity dispersion function (from 
solving the Jeans equations), 

(iii) by assuming that the distribution function is a func- 
tion of relative energy only, spatial and velocity cordinates 
obtained from from distributing the particles uniformly ran- 
domly in energy. 

In the third scheme, we sample uniformly, randomly 
from the integrated diff erential energy distribution (see 
iBinnev fc Tremaind l2008h to obtain a particle's energy, use 
the energy to determine the maximum radius it implies, al- 
locate the particle's spatial position within that radius using 
j-'E(r) fequation l37|l . and finally use the energy difference be- 
tween between the particle's energy and its spatial position 
to allocate the velocity components. 

The first two schemes are appropriate when observa- 



3.5 Weight convergence & observable 
reproduction 

In assessing whether or not a modelling run has been suc- 
cessful, we require, amongst other criteria, a high degree of 
weight convergence and consistent observable reproduction 
over a number of orbits. In practice, we choose some repre- 
sentative time period T c which will cover, for example, both 
the long and short period particle orbits. For weight conver- 
gence, we consider a particle's weight to have converged if 
its maximum relative deviation from its mean weight over 
time period T c is less than some predetermined model-wide 
tolerance, that is if 



-mcani 



'mean.? 



< tolerance. 



(39) 



We take as our unit of time the local dynamical time at the 
half mass radius of the model 



1 time unit : 



3tt 
16G,5' 



(40) 



where p is the mean density inside the half mass radius. For 
a typical model run, we set T c to be « 10% of the model 
duration and use a tolerance of 5 per cent or less. 

Good model stability with respect to the particle 
weights is also required. Stability in this context means the 
degree to which a model's outputs vary if the particles are 
subsequently orbited with weight evolution turned off. This 
is particularly important if the particles are to be used in 
some further modelling process. Stability is affected by the 
number of particles with converged weights and the total 
weight associated with particles with unconverged weights, 
and is illustrated in section [4] 

We consider a M2M model to have reproduced the ac- 
tual observations if the smoothed model observations match 
the actual observations to within the measurement errors on 
the actual observations, that is if 



(41) 
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jjj may be calculated directly from Aj using the smoothed 
form of equation [10] Note that observable non-reproduction 
does not necessarily imply that the method has failed - the 
smoothness of the constraining data needs to be taken into 
account as does the degree of smoothing employed in the 
model. 

Given our success criteria for a model run and the mech- 
anisms we use to measure whether they have been achieved, 
we do not deploy a 'p hase mixing' phase as described in 
Ide Lorenzi et alj (^OOTt ). 



for binning, and also that for numerical integration and in- 
terpolation, utilises the GNU Scientific LibrarjQ. 



4 CREATING A M2M MODEL 

In this section, we consider various aspects of creating a 
M2M model and cover tuning its key parameters, the impact 
of different particle initial conditions, how many particles to 
use, and modelling incomplete data sets. Section 14.21 deals 
specifically with the impact of the relative entropy derivative 
constraint. 



3.6 Numerical methods 

For orbit integration, we use the standard interleaved sec- 
ond order leapfrog method (drift, kick, drift) with either a 
fixed or adaptive time step. We either specify the time step 
directly or allow the model to determine it from the local 
dynamical time at the origin. For adaptive time stepping we 
use a 3 level model, progressively decreasing a particle's time 
step by a factor of 2 as it nears the origin. Over the dura- 
tion of a typical modelling run of 250 dynamical time units, 
we achieve an average maximum relative energy precision 
AE(t)/ E(0) w 10 -4 which represents a reasonable level of 
energy conservation and is satisfactory for M2M purposes. 
We have tried (and discarded pending further inves tigation) 
the 'd imensionless time' orbit integration noted in iDehnenl 
(2009). While it does increase the numbers of orbits of par- 
ticles with long orbital periods, we find that weight con- 
vergence is reduced by « 2% and that there is no change 
regarding the type of particles with unconverged weights at 
the end of a modelling run - it remains the highest energy 
particles. 

For integration of the weight evolution equation and 
exponential smoothing we use Euler's method. For example, 
the exponential smoothing equation (equation 3} becomes 



A,(t + 1) = Aj(t) + aSt [A 3 (t) - Aj(t)] 



(42) 



Rather than updating the particle weights every time step, 
we take the weight evolution time step as an integer multiple 
(^ 5 for production runs) of the orbit integration time step. 
By reducing the inter-processor data traffic, a saving in com- 
puter elapsed time is achieved with no loss in effectiveness 
of the M2M method. For example, changing the multiple 
from 1 to 5 results in « 50% saving in elapsed time with 
minimal changes in both weight convergence and the model 
Xlm value. 

For three dimensional radially dependant observables, 
we use a radial binning scheme, and a polar scheme, with 
azimuthal binning if required, for two dimensional observ- 
ables. The bin sizes may be either regular or irregular. Also, 
individual bins are only used if required by the distribution 
of the observational data - that is, we allow for gaps in the 
data. For the regular schemes, we use uniform or logarithmic 
bin sizes, or, to assist in the resolution of any central den- 
sit y peak, pseudo- logarithmic radial bin sizes as described 
in lSellwoodl |2003l ). 



= (r 



+ i: 



i/B 



1. 



i = 1, 



,B 



(43) 



where r max is the maximum radius, divided into B bins, 
and Ti is the radius of the i th bin boundary. The software 



4.1 Parameter tuning 

The parameters are the weight convergence rate e, a for ex- 
ponential smoothing, fi governing regularisation, the observ- 
able constraint A^'s from equation [15] and A sum and Ai so for 
the sum of weights constraint and isotropic dispersion con- 
straint. The parameters are treated as tunable with their 
values being determined prior to the start of a modelling 
run. Setting the parameters should be thought of as a pro- 
cess - we have identified no simple mechanism which will 
determine all the parameters required in one trial modelling 
run. The key to the process is to establish the initial values 
for the parameters. Having done this, it is straightforward to 
perform a series of modelling runs, increasing or decreasing 
the parameter values, to achieve a particular desired posi- 
tion. 

We first explain how we assess the impact of the rel- 
ative entropy (regularisation) term in the weight evolution 
equation. To recap, where constructed observable data are 
being used, we create the data using Gaussian sampling 
from the various functions of the associated mathematical 
model. The combination of relative error and sigma cut-off 
determine the spread of data points around the functions. 
In practice, what is needed from a M2M model, no matter 
whether the data is constructed or real, is not that the ob- 
servable data points are individually reproduced but that a 
smooth curve approximating the data points and reflecting 
any key features in the data is generated. Given the method 
of data construction, the smooth curve approximating the 
data points should be the function the data were generated 
from. To give a measure of how effective a M2M model is in 
re-creating the underlying functions, the following relative 
sum of squares is calculated for every class s of observable 
constraint. 



j 

£ 

3 



x, 



(44) 



where Xj is the theoretical value for data point j and Xj is 
the equivalent smoothed value produced by the M2M model 



x 3 = Y 3 + Ajcr(Yj] 



(45) 



Where observations of a real galaxy are being used, the un- 
derlying functions will not be known. The solution then is to 
create a trial data set approximating the real data set, with 
known underlying functions, and to use that trial data to 
determine the degree of regularisation required (the value of 

1 http:/ /www. gnu.org/software/gsl/ 
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n) when the real data is modelled. Note also that regularisa- 
tion is model-wide and is not specific to any one constraint. 
Our preference is to start with a low value of /i initially 
« 10 -3 ). We find that as /j, is increased it becomes nec- 
essary to reduce e to maintain good weight convergence. 

Given the above, our completion criteria for a tuning 
exercise are that a high degree of weight convergence has 
been achieved, that the amount of unconverged weight is 
low, that the observable C s values described above are stable 
(orbiting the particles with weight evolution turned off does 
not cause significant change), and that all other constraints 
have been met. 

For the observable constraints, the weight evolution 
equation contains terms of the form 



(46) 



In the following, we quantify the impact of the term numer- 
ically and relate it to the value of the relevant A^ parameter 
and to the relative strengths of the observable constraint 
terms in the weight evolution equation. 

Replacing Aj by its unsmoothed form and replacing 
a{Y 3 ) by 



(47) 



(where ay is the fractional error) gives terms of the form 



(48) 



Assuming the right hand fraction is numerically comparable 
for all observables then it is the left hand fraction which 
dictates the constraint contribution to the weight evolution 
equation. 

For example, using the expressions for the kernels in 
section 12.31 and ignoring the total luminosity of the galaxy 
being modelled, the left hand fractions (evolution factors) 
for the surface brightness and dispersion constraints are 



T 3B,j = 



tVD.j 



(49) 
(50) 



where viu is some line of sight velocity. For simplicity, v\\j 
is taken as the maximum velocity occurring in the model. 
When comparing model runs at different mass-to-light ra- 
tios, given that vjf, scales according to the ratio T, the 
balance of the terms in the weight evolution equation will 
change across the runs. This can be resolved by multiplying 
Tvd.j by T" 1 . 

From experience, we have found suitable start values for 
the observable constraint parameters by making the product 
of the typical evolution factor and parameter w 0(1), that 
is, the parameter is used to neutralise the evolution factor. 
For example, for surface brightness 



TsbjAsb « O(l) 



(51) 



Clearly, the evolution factors will have a range of values per- 
haps spanning several orders of magnitude. For the models 
in this paper, taking the value at 2 effective radii gives a 
suitable compromise. 

Using a similar analysis to that above for the sum 



of weights and isotropic dispersion constraints, we set the 
product of the typical evolution factor and parameter to 
be w 10 -2 . Setting the values initially to be lower than for 
the observable constraints means that observable constraints 
have a stronger influence in the weight evolution equation. 

For the exponential smoothing parameter a, we perform 
a series of modelling runs with no regularisation and with a 
being varied from 10 1 (no smoothing, a = St -1 - see equa- 
tion to 5 x 10 -3 . We find that, as the amount of expo- 
nential smoothing increases, weight convergence increases, 
the unconverged weight decreases and the magnitude of the 
Xlm gradient increases. The statistical fluctuations in Xlm 
are greatly reduced once 10 -2 < a < 10 _1 where the exact 
value depends on the number of particles being used (5 x 10 4 
at the lo wer end to 10 6 at t he up per). It therefore turns out 
that the ISver fc Tremainel (Il996l ) value of a = 5.24 x 1CT 2 
is in fact a reasonable default value to use over quite a wide 
range in particle numbers and bin configurations. 

To summarise, 

(i) Parameter determination must be viewed as a process. 

(ii) For the main observable constraints, setting the prod- 
uct of the typical evolution factor and parameter to be of 
O(l) gives a means of determining the initial values of the 
parameters. For other A constraints, setting the product to 
~ 10 -2 gives a usable start position. 

(iii) For the other parameters, e = 0.025, a = 0.05 and 
H — 0.001 are reasonable starting values. 

(iv) The larger the observable errors and the greater the 
spread of data points, the more likely it is that smoothing / 
regularisation from the relative entropy term will be needed 
and a higher value of [t, required. Quite how much regulari- 
sation is required (or desirable) is application specific. With 
increased regularisation it is highly likely that observable re- 
production will reduce. Based on the experiments we have 
conducted, a higher value of y,, changing the influence of 
terms in the weight evolution equation, requires a reduction 
in the value of e to achieve acceptable results (for example, 
high weight convergence). 

(v) Regardless of the value of e, the rate of particle weight 
convergence should be monitored. Usable results may be ob- 
tainable from shorter modelling runs. 

As a final comment, particularly when comparing pa- 
rameter values between papers, it is the e parameter prod- 
uct (for example, e/i in equation 1 14f) which is important. 
Parameter values may be rescaled as required provided e is 
rescaled correspondingly. Also, determination of the param- 
eter values may appear onerous but it should be remembered 
that it is only necessary to incl ude the constraints and p a- 
rameters that are required. The lSver fc Tremainel l| 19961 ) fi 
adjustment process has not been used but the process or 
its equivalent may be applicable to other parameters and 
requires further investigation. 



4.2 Relative entropy and the derivative constraint 

We investigate the behaviour of the relative entropy regu- 
larisation term (S) and the relative entropy derivative con- 
straint (dS/dt) as the /i parameter is increased from 10 -3 to 
10 3 . We perform 2 sets of runs, the first having no further 
constraints and the second using the total particle weight 
constraint. 5 x 10 4 particles are used and A sum = 7 x 10 2 in 
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Table 2. Rcgularisation 





Unconstrained Weight 


Constrained Weight 




S 




S 




1.0 icr 3 


3.52 10~ 3 


1.00 


1.43 10~ 6 


1.00 


1.0 IO" 2 


3.36 10~ 2 


0.97 


1.43 IO" 5 


1.00 


1.0 io- 1 


2.21 IO" 1 


0.74 


1.43 10~ 4 


1.00 


1.0 10° 


3.68 IO" 1 


0.38 


1.43 10~ 3 


1.00 


1.0 io 1 


3.68 IO" 1 


0.37 


1.40 IO" 2 


0.99 


1.0 10 2 


3.68 IO" 1 


0.37 


1.16 IO" 1 


0.88 


1.0 10 3 


3.68 IO" 1 


0.37 


3.41 IO" 1 


0.52 




Increasing regularisation (fi) with and without the total weight 
constraint, and no other constraints. 



the second set of runs. The theoretical maximum in 5* occurs 
when Wi = rrii/e at which time, given that m% = 1/N where 
N is the number of particles, S ma x = 1/e and ^w, = 1/e. 
For higher value of /i, these values for Smax and ^ Wi can 
be seen in Table [2] in the 'unconstrained weight' columns. 
However, there is a conflict between the value of V} tm at 
5 max and the requirement for Wj = 1 to conserve the lu- 
minosity of the galaxy being modelled. Imposing the total 
weight constraint causes the requirement to be met but at 
a lower value of S'max (the 'constrained weight' columns of 
Table [2}. Where 7^ 1, this can be resolved by increas- 

ing the value of A sum . For both sets of runs, dS/dt does tend 
to zero with increasing time, with the timescale to do so 
reducing as n is increased. Within the maximisation of F, 
equation 1121 dS/dt behaves as the constraint dS/dt — and 
not as a function to be maximised. 

We now extend the model to include other constraints 
(as in Table |4| and the results are recorded in Table [3] For 
completeness, we also include a run with fi = 0, that is, with 
no regularisation. As before, the time by which dS/dt w 
and S ~ constant reduces at higher n values (illustrated in 
Figure^}. Note that the initial value of S is zero and that the 
M2M model may produce a negative final value of S. This 
is interpreted as the maximum value of S that the method 
is able to generate given all the other constraints. 



4.3 Particle initial conditions 

We examine the effect of the three different particle initial 
spatial and velocity conditions using a Plummer model with 
2 x 10 5 particles run for 250 time units. The full parameter 
set is recorded in Tableland the results are shown in Table 
[S] All three schemes perform satisfactorily and, as might be 
expected, it is the distribution function energy-based scheme 
(section 1 3 . 3 P which yields the best results. From Figure [5] it 
is clear that, for the random velocity scheme, it is the high 
energy particles whose weights evolve furthest from their 
start values. The evolution over time of the model Xlm val- 
ues (Figure O shows differences between the schemes. In 
particular, the energy-based Xlm shows no initial 'overshoot' 
as the particles start orbiting. 



-6e-03 1 1 ' 1 1 ' r "w-^fw^ 1 

100 200 300 400 500 600 700 800 
Time 

8e-02 I 1 1 1 1 1 

p. 

6e-02 



4e-02 - A 
to 2e-02 
0e+00 




-2e-02 - 

-4e-02 I 1 1 1 1 

50 100 150 200 250 

Time 

Figure 4. Relative entropy time evolution for fi = 1.0 (top panel) 
and fi = 10.0 (bottom panel). A relative entropy constant value 
is achieved after 750 time units for the lower value of ji and 
after pa 100 units for the higher value. 

Table 4. Particle initial conditions modelling parameters 



Parameter 


Model Value 


Overall model size 


10 units 


Number of particles 


2 X 10 5 


Model duration 


250 units 


Orbit integration time step 


0.02 units 


Weight convergence monitoring 


25 units 


Weight convergence tolerance 


5% 


e 


2.5 x 10~ 3 


a 


5.0 x IO" 2 


H 


1.0 


Surface brightness, AgB 


5.0 x KT 4 


Second velocity moment, Ay2 


5.0 x 10~ 3 


Gauss-Hermite /14, Ah4 


2.0 x IO" 3 


Sum of weights, A SU m 


7.0 x 10 2 


Isotropic dispersion, A; so 


1.0 x 10" 1 



Spatial distances are given in units of the projected half light 
radius and times or durations in units of the half mass dynamical 
time. 32 bins are used for the surface brightness constraint and 
24 for the velocity related constraints. 
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Table 3. Impact of increasing rcgularisation 



M 


—F 


S 




^SB 


X V 2 


Particles 


Uconv 




Csb 


C*V2 














Converged 


Weight 




















(%) 


(%) 








0.0 


5.48 10~ 2 


-9.03 10~ 3 


1.10 lO" 1 


2.27 10 1 


1.88 10 1 


99.53 


0.62 


1.00 


3.01 10" 2 


1.63 10" 1 


1.0 10~ 3 


5.48 10" 2 


-9.02 10~ 3 


1.10 lO" 1 


2.27 10 1 


1.88 10 1 


99.53 


0.62 


1.00 


3.01 10~ 2 


1.63 10- 1 


1.0 10~ 2 


5.49 10" 2 


-8.95 10" 3 


1.10 lO" 1 


2.28 10 1 


1.88 10 1 


99.53 


0.63 


1.00 


3.00 10~ 2 


1.63 10" 1 


1.0 lO" 1 


5.61 10~ 2 


-8.26 10" 3 


1.10 lO" 1 


2.30 10 1 


1.90 10 1 


99.54 


0.62 


1.00 


2.93 10" 2 


1.57 lO" 1 


1.0 10° 


6.27 10~ 2 


-3.28 10~ 3 


1.17 10" 1 


2.51 10 1 


2.04 10 1 


99.44 


0.72 


1.00 


2.43 10~ 2 


1.13 lO" 1 


1.0 10 1 


5.00 10~ 3 


1.34 10~ 2 


1.40 lO" 1 


3.48 10 1 


2.44 10 1 


98.23 


1.96 


0.99 


1.23 10~ 2 


2.63 10~ 2 


1.0 10 2 


-6.07 10° 


1.51 10" 1 


2.64 10" 1 


2.32 10 2 


2.93 10 1 


96.64 


3.53 


0.88 


3.99 10" 1 


3.02 10" 1 


1.0 10 3 


-2.58 10 2 


3.41 10" 1 


2.31 10° 


3.13 10 3 


1.44 10 2 


99.03 


1.01 


0.52 


7.29 10° 


5.47 10° 



Increasing regularisation (fi) with the total weight constraint applied and other constraints as in Table [4] Csb and Cv2 are defined as 
per equation 1441 At higher values of fj,, the weight evolution equation becomes 'unbalanced'. S increases as do the constraint x 2 values. 



Table 5. Comparison of different particle initial conditions 



Run 


—F 




XsB 


Xv2 


Particles 


Uconv 


C'sB 


Cv2 












Conv (%) 


Weight (%) 






Energy 


6.60 10" 2 


1.23 10" 1 


2.20 10 1 


2.20 10 1 


98.25 


1.96 


2.54 10~ 2 


7.92 10~ 2 


Gaussian 


8.25 10~ 2 


1.26 10 -1 


2.47 10 1 


2.23 10 1 


97.04 


1.96 


2.87 10~ 2 


1.11 lO" 1 


Random 


2.09 lO" 1 


1.47 10" 1 


5.35 10 1 


2.30 10 1 


94.86 


2.05 


1.43 10" 1 


1.64 lO" 1 



Random indicates that the spatial distribution matches the luminosity density with random velocity components while Gaussian has 
the same spatial distribution as Random but velocity components calculated by Gaussian sampling using the velocity dispersion. 
Energy indicates that the particle energies have been calculated utilising the distribution function. 



4.4 Number of particles 

We compare the effect of running a M2M model with dif- 
ferent numbers of particles, from 10 4 to 10 6 , and show the 
results in Table [6] We use energy-based particle initial con- 
ditions and the same parameter settings as in section 14.31 
For the constraints used, the M2M model performs well for 
all numbers of particles. As might be expected, increasing 
the number of particles increases F, reduces the model Xlm> 
increases particle weight convergence and reduces the total 
weight associated with particles with unconverged weights. 

Once an acceptable level of behaviour has been achieved 
from a M2M model, there is little to be gained by just in- 
creasing the number of particles without altering some other 
aspect of the model (for example, improving the spatial cov- 
erage of the observational constraints). Running the model 
with a low number of particles is attractive because of the 
reduced computer run times - the 10 4 particle run in Table 
[6] took ~ 2 minutes on a 2.8 GHz 'dual core' workstation. 
However, the weight stability of the application (running the 
model with weight evolution turned off) must be considered. 
The stability of a 10 4 particle run is worse (more variation in 
the model outputs over time) than that of a 10 5 or a 5 x 10 5 
particle run. 

Changing the particle initial conditions to the Gaussian 
velocity scheme shows only a minor degradation in model 
behaviour - weight convergence, for example, reduces by less 
than 0.5%. 



5 APPLICATION - MASS-TO-LIGHT 
DETERMINATION 

5.1 Overview 

We illustrate a practical application of the M2M method 
by using it to determine t he mass-to-light ratio of a simple 
spherical Plummer model (|Plummerl Il91 ll ) . Assuming that 
mass follows light and that the mass-to-light ratio is con- 
stant, we create a data set, comprising surface brightness, 
line-of-sight velocity dispersion and /14 Gauss-Hermite co- 
efficient values, with a known mass-to-light ratio (5 in this 
case). We run our M2M implementation with different mass- 
to-light ratios and expect that the data mass-to-light ratio 
will be indicated by a minimum in the values of — F and the 
model x 2 values. 



5.2 Model and data preparation 

We use constructed functions for luminosity density v{R, xn) 
and velocity dispersion a(R,xn) as described in section [3T2T 
In the following, we take the total luminosity as 1, the grav- 
itational constant G = 1, T as the mass-to-light ratio, R as 
the projected radius and r as the spherical radius. Spatial 
distances are given in units of the projected half light radius 
(= 1 for our model). The key Plummer model expressions 
we require are 



(i) Surface brightness 



I(R) = / da;||i/(7?,a:||). 



(52) 
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Table 6. Impact of increasing the number of particles 



Number 
Particles 


— F 








*SB 


X V 2 


Particles 
Conv (%) 


Uconv 
Weight (%) 


CsB 




Cv2 




1 10 4 


7.20 10- 


2 


1.29 10" 


l 


3.01 10 1 


2.15 10 1 


95.93 


4.50 


2.36 10- 


-2 


1.1810" 


i 


2 10 4 


6.59 10" 


2 


1.21 10" 


l 


2.59 10 1 


2.07 10 1 


97.58 


2.67 


2.85 10- 


-2 


9.59 10" 


2 


5 10 4 


6.31 10" 


-2 


1.18 10- 


l 


2.52 10 1 


2.03 10 1 


98.76 


1.41 


2.51 10" 


-2 


1.20 10- 


1 


1 10 5 


6.26 10" 


-2 


1.17 10- 


i 


2.55 10 1 


2.02 10 1 


99.20 


0.96 


2.22 10- 


-2 


1.15 10" 


1 


2 10 5 


6.27 10" 


-2 


1.17 10" 


i 


2.51 10 1 


2.04 10 1 


99.44 


0.72 


2.43 10" 


-2 


1.13 10" 


1 


5 10 5 


6.24 10- 


-2 


1.17 10" 


i 


2.52 10 1 


2.04 10 1 


99.54 


0.58 


2.39 10- 


-2 


1.10 io- 


1 


1 10 6 


6.23 10" 


-2 


1.17 10" 


i 


2.57 10 1 


2.03 10 1 


99.59 


0.53 


2.28 10- 


-2 


1.09 10" 


1 



Increasing the number of particles increases F, reduces the model Xlm' mcreases particle weight convergence and reduces the total 

weight associated with particles with unconverged weights. 
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Figure 5. End of run weight evolution comparison for the energy- 
based (top panel) and random velocity (bottom panel) schemes 
for creating particle initial conditions. The weights for the high 
energy particles in the random velocity scheme evolve furthest 
from their start values. 



(ii) Luminosity weighted line-of-sight velocity dispersion 
J dx\\v{R,x\\)o- 2 (R,x\\) 



^ I(R) 

(iii) Distribution function 

fQE\) x \E\ 7 ' 2 . 

(iv) Potential 



(53) 



(54) 



Figure 6. Time evolution of Xlm ^ or ener 5 , 2/-t , ased (top 
panel) and Gaussian velocity (bottom panel) schemes for cre- 
ating particle initial conditions. 



(r) 



T 



(r 2 + 1) 



1/2' 



(55) 



The observational constraints are surface brightness, 
surface brightness times line-of-sight velocity dispersion 
squared and surface luminosity times the hi Gauss-Hermite 
coefficient. Surface brightness is limited to a radial extent of 
8 units and the velocity constraints to 5. We bin a ll observ- 
ables radially, with pseudo-logarithmic bin sizes l|Sellwoodl 
2003), utilising 32 bins for surface brightness and 24 bins for 
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Table 7. Mass-to-light modelling parameters 



0.14 



Parameter 



Model Value 



Data mass-to-light ratio 


5 


Overall model size 


10 units 


Number of particles 


5 X 10 4 


Model duration 


250 units 


Weight convergence monitoring 


25 units 


Weight convergence tolerance 


5% 


€ 


2.5 x 10" 3 


a 


5.0 x 10" 2 




1.0 




5.0 x KT 4 


<Wd 


2.5 x 10~ 3 /T 


Ah4 


2.0 x 10" 3 


^sum 


10 3 


Mso 


5.0 x lO" 1 


Line of sight axis 


x axis 



Spatial distances are given in units of the projected half light 
radius and times or durations in units of the half mass dynamical 
time. 

the velocity constraints. For the error terms, a(Yj), we use 
a relative error of 5 per cent for surface brightness and 10% 
for line-of-sight velocity dispersion. For hi, we use an ab- 
solute error of 0.015 which is consistent wi th the published 
SAURON rms error ijde Zeeuw et al.ll2002l ). We create the 
surface brightness and line-of-sight velocity dispersion data 
values as described in section 13.41 using a 2-sigma cut-off. 
For hi, we just take the theoretical values (times the bin 
luminosity). In addition to the observational constraints , 
we also impose the total weight and the isotropic velocity 
dispersion constraints. 

For the purposes of this mass-to-light illustration, the 
particle initial conditions are energy based as described in 
section [331 and are created using the mass-to-light ratio of 
the model run, not the ratio with which the data was cre- 
ated. The particle initial weights and priors are set as 1/N 
where N is the number of particles. 



5.3 Results 

We execute the M2M model, using the parameters in Table 
[JJ for 9 different mass-to-light ratios and plot the resulting 
model —F, —S and Xlm values against mass-to-light ratio. 
As can be seen from Figure [7] these model values have a 
minimum at a mass-to-light ratio of ~ 5. We fit smooth 
curves to — F, —S and Xlm using cubic spline interpolation 
and determine that the minimum values occur at mass-to- 
light ratios of 4.91, 4.71 and 4.97 respectively. By removing 
the Xlm factors (the in equation I15|) and rescaling the 
X 2 curve such that the x 2 minimum value is equal to the 
number of degrees of freedom (78 in this case), we establish 
the la error bounds and give the model determined value 
of the mass-to-light ratio of the data set as 4.88 ± 0.21. To 
complete the x 2 analysis, we show the individual observable 
X 



plots in Figure [8] 



The total weight constraint is met (Table [8]) and so is 
the velocity dispersion isotropy constraint except at T = 2.5. 
Weight convergence is high, peaking close to the true mass- 
to-light ratio, and, no less important, the weight associated 
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Figure 7. End of run —F, —S and Xlm Pitted against mass- 
to-light ratio T. The minimum values are at T = 4.91, T = 4.71 
and T = 4.97 respectively. 



with particles with unconverged weights is low at < 2% of 
the total particle weight. 

By examining the particle weight distribution in ve- 
locity space, we investigate how the M2M method behaves 
given a mass-to-light value which does not match the obser- 
vational data set. For our Plummer model with the x-axis 
orientated to the line of sight, plotting the weight distribu- 
tion contours perpendicular to the line of sight (in the v z —v y 
plane) we would expect to see a circular pattern as in Fig- 
ure [9] and also in the other two planes. The T = 5 v x — v y 
plot (Figure [TTJ)) is as expected. However, the T = 2.5 plot 
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Table 8. Weight convergence and dispersion isotropy 
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Figure 8. End of run X§b> ^vd anc ' ^h4 plotted against mass-to- 
light ratio T. The minimum values occur at T = 4.59, T = 4.90 
and T = 5.52 respectively. 
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Weight convergence peaks close to the true mass-to-light ratio. 
The total weight constraint is met for all runs and the velocity 
dispersion is isotropic except for T = 2.5. 
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Figure 9. Particle weight distribution in velocity space perpen- 
dicular to the line of sight for T = 5. The highest valued contours 
arc in the centre with the lowest on the outside. 



persion constraint compared with the observable data. At 
lower mass-to- light ratios, calculating dL/dE from the end- 
of-run particle table shows that it is increased at low energies 
and reduced at high energies by comparison with its initial 
values. The converse is true for higher mass-to-light ratios 
with dL/dE being reduced at low energies and increased at 
high. 



is stretched in the v x direction whilst the T = 10 plot is 
compressed. Given the maximum velocity for T = 2.5 is less 
than that of the data set (T = 5), it is reasonable to expect 
that the M2M model selectively increases the weights of par- 
ticles to try and reproduce the data set line-of-sight velocity 
dispersion. Similarly for T = 10, with a greater maximum 
velocity, the weights of particles are selectively reduced. The 
end result is the distorted particle weight distributions. 

In Figure QT] we display plots for different mass-to-light 
ratios showing the model luminosity per unit energy dL/dE 
compared with the theoretical function and the velocity dis- 



5.4 Different constraints and initial conditions 

Clearly there are other combinations of constraining observ- 
ables and particle initial conditions we could have used. We 
now examine the effect of replacing surface brightness (run 
9 in Table [9} by luminosity density as a constraint (run 12), 
using an alternative velocity dispersion constraint (run 14), 
changing the particle initial conditions from energy based to 
Gaussian velocity (run 15) and random velocity (run 16), 
and finally increasing the number of particles from 5 x 10 4 
to 2 x 10 (run 10). For the alternative velocity dispersion 
constraint, we remove the surface brightness multiplier so 
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Figure 11. Reproduction of dL/dE and velocity dispersion. The rows, from the top, are for T = 2.5 and T = 5. The solid circles are 
the model produced values. In the left hand panels, the dotted line is the theoretical function. In the right hand panels, the solid line 
indicates the observable constraint data without error terms. The dispersion constraint is not met for T = 2.5. The M2M model is unable 
to increase the particle weights sufficiently to reproduce the constraint. The dL/dE plots show that is the high relative energy particles 
which are most obviously affected with the lower energy particle weights being adjusted to compensate. The converse arguments apply 
for mass-to-light ratios higher than the true ratio. 



Table 9. Different constraints and initial conditions 



Run 




X 2 




9 


4.97 


4.88 ±0.21 


4.90 ±0.27 


12 


4.96 


4.88 ±0.18 


4.91 ±0.26 


14 


5.61 


5.37 ±0.28 


5.38 ±0.33 


15 


4.99 


4.92 ±0.22 


4.79 ±0.28 


16 


4.80 


4.65 ±0.27 


4.52 ±0.27 


10 


5.00 


4.89 ±0.22 


4.90 ±0.27 



Run 9 is the main mass-to-light run described in section[53] Run 
10 using 4 times as many particles as run 9 shows little difference 
in the results. The remaining runs are as per section [5. II 



5.5 Summary 

To conclude this section, we have demonstrated the M2M 
method in action determining the mass-to-light ratio of a 
galaxy modelled by a theoretical Plummer sphere model. 
All the constraining observables are simple combinations of 
observational measurements regularly taken of actual galax- 
ies, that is surface brightness, line-of-sight velocity disper- 
sion and the /14 Gauss-Hermite coefficient and their asso- 
ciated errors. For our constructed data generated with a 
mass-to-light ratio of 5, we are able to use the M2M method 
in a variety of ways to estimate that ratio. Using Xvdi our 
best estimate is 4.91 ± 0.26 with a spread of results from 
4.52 ±0.27 to 5.38 ±0.33. 



that the constraint is just line-of-sight velocity dispersion 
squared. 

We find that the largest variations from the true mass- 
to-light ratio come from using the alternative dispersion con- 
straint and the random velocity initial conditions. For the 
other runs, the true mass-to-light is within 1-sigma of the 
model estimates. Note that increasing the number of parti- 
cles makes very little difference. 



6 DRACO 

6.1 Introduction 

Draco is a dwarf spheroidal satellite galaxy of the local group 
located some 75 kpc from the Sun and is interesting, both 
cosmologically and astrophysically, in that there is no cur- 
rently accepted explanation for its stellar kinematics with- 
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Figure 10. Particle weight distribution in velocity space parallel 
to the line of sight for T = 2.5 (top), T = 5 (target, centre) and 
T = 10 (bottom). As described in section \5. 31 the contours show 
stretching and compression in the v x direction as the M2M model 
tries to match the supplied velocity dispersion constraint for T 
values less than and greater than the target value. The reason for 
the double peak in the T = 2.5 plot is not known. 



out invoking dark matter. IWilkinson et~aH (|2002T l describe 
a math ematical mode l for m odelling dwarf spheroidal galax- 
ies and iKlev na et alj (|2002l ) apply the model to Draco. In 
this section, we us e the M 2M method with the Draco data 
from iKlevna et aD J20021) plus the i sotrop ic velocity distri- 
bution model from I Wilkinson et alJ (2002) to determine the 
mass-to-light ratio of Draco. The data set comprises 159 
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Figur e 12. Spatia l distr ibution of Draco velocity measurements 
from lKlevna et al.l ||2002|) . 



line-of-sight stellar velocity measurements with their errors. 
The spatial distribution of the measurements is shown in 
Figure [12] Binning the data to create a second velocity mo- 
ment shows that the mome nt is rising with incre asing radius. 
Other key data taken from lKlevna et all (|2002h are the cen- 
tral velocity dispersion of 8.5 km s _1 , the effective radius of 
9.71 arcminutes (w 214 pc), and the central V-band surface 
brightness of 2.2 x 10 6 Lq kpc~ 2 . Following the analysis in 



IWilkinson et all (|2002f ) and elsewhere, the surface brightness 
data is taken to follow a spherical Plummer model. 

For an isotropic velocity dispersion with a rising line- 
of- sight velocity dispersio n curve, the key equations based 
on IWilkinson etafl (|2002h arc 



(i) Relative potential 
Mr) = ^ 

where a < 0. 

(ii) Distribution function 

f{E) oc \E\ 5/a ~ 3/2 . 

(iii) Circular velocity 
2 2 r 2 

Vdrc = V 



(l + r 2)i+«/a 



(56) 



(57) 



(58) 



where tpo = v%ja. Given that the matter distribution is 
spherical, v c i IC may also be expressed as 



2 GM(< r) 
v ■ — . 



(iv) Surface brightness 
Io 



I(R) = 



(l + R 2 f 



(59) 



(60) 



where R is the projected radius, and Io is the measured 
central surface brightness. 

(v) Line-of-sight second velocity moment 



a 2 (R) 



(1 + R?) 



,/2 



(61) 



where 00 is the measured central line-of-sight value. The 
relationship between ctq and vo is 
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Table 10. M2M determination of Draco potential power law 



Measure 


a 


Error bounds 


—F 


-1.18 






-1.24 


+0.12, -0.16 


-S 


-0.96 




x 2 


-0.95 


+0.04, -0.06 


xJ B 


-0.96 


+0.03, -0.04 


XyD 


-0.90 


+0.36, -0.35 



le+00 



Constraint 



CI 

b 



The value of the potential power law index a determined from 
different M2M model outputs. 



2 

(TO = 



30F«gr(2 + a/2) 
4(a + 5)r(5/2 + a/2)' 



(62) 



Spatial distance are in units of the effective radius of Draco, 
and velocities, in effective radii per 10 7 years. 

The role of the M2M method is to determine the value 
of a which best fits the data, and then we use the circular 
velocity to determine the mass it implies (via equations 1581 
and I59|l and thus the mass-to-light ratio. We vary a in the 
same manner that the mass-to-light T ratio was varied in 
section [5] 

6.2 Proof of approach 

Before using Kleyna's Draco data, we examine the ability of 
the M2M method to determine the Wilkinson a parameter. 
Surface brightness and line-of-sight second velocity moment 
data are created using the functions in section [6TT1 with 1% 
relative errors for surface brightness and 10% for the velocity 
moment. For this trial data set, a — —0.5. We run the M2M 
models with 2 x 10 5 particles and for 250 time units. The 
model parameters have values e = 2.5 x 10 -3 , fi = 1, A sum = 
10 3 and A iso = 7 x 10" 2 . A S b has the fixed value Asb = 10" 5 
while Avd is varied to accommodate the different maximum 
velocities in the models as a is varied (from a — —0.8 to a = 
—0.2). We find that the modelling runs give clear minima 
(close to the true a value) in the \ 2 values, with a — —0.51 
from xsb an d a = —0.49 from Xvd- That is, the approach 
works ! 
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Figure 13. Draco second velocity moment from binned velocity 
measurements. The top panel shows the second velocity moment 
data points and error terms, and the bottom panel, the number of 
velocity measurements per bin. The projected radius R is in units 
of the effective radius of Draco, and velocities are in effective radii 
per 10 7 years. 
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6.3 Modelling Draco 

Binning the Draco velocity measurements (159 in total), us- 
ing equal interval projected radius bins, to create a set of 
second velocity moment data points and error terms gives 
the plots in Figure [131 As can be seen, the measurements are 
centrally clustered and the velocity moment data points at 
higher radii suffer from a lack of contributing measurements. 
Comparing Figures [13] and 1141 the trial data has more data 
points than the Draco data (24 versus 7) and the spread of 
points reflects the generating function. The Draco data does 
not visibly reflect any underlying curve. 

After some experimentation, we run the M2M models 
for values of the Wilkinson a parameter in the range —1.5 
to —0.5 with the same model parameters as in the proof of 
concept exercise in section |Tx2l The results are captured in 
Table [101 Weight convergence is high (> 99%) in all runs 
except for a = —1.5 where it is slightly lower (96%). There 
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Figure 14. The trial second velocity moment data used in the 
proof of concept exercise in section 16.21 



Table 11. Model vs theory line-of-sight velocity distribution 



a 


% within 5% 


% within 10% 


-1.50 


66.0 


88.7 


-1.25 


84.3 


100.0 


-1.10 


89.3 


100.0 


-1.00 


93.7 


100.0 


-0.90 


99.4 


100.0 


-0.75 


88.1 


98.1 


-0.50 


81.8 


90.6 
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For different values of a, the percentage of the individual velocity 
measurements that have model line-of-sight velocity distribution 
probabilities within 5% and 10% of their theoretical probabilities. 

are no issues associated with the sum of weights constraint 
and the isotropic dispersion constraint. 

We repeat the runs and include the individual velocity 
measurements as constraints (see section 12.21 equation I21|) 
as well as the line-of-sight second moment. The individual 
measurements make very little difference to the results in 
Table 1101 Taking the model produced line-of sight velocity 
distribution and comparing it to the theoretical distribution 
at all of the individual velocity measurement points gives 
a measure of how well the M2M model has reproduced the 
theoretical distribution (Table [TTj) . As can be seen the high- 
est reproduction at the 5% level occurs at a = —0.9 which 
is also the value resulting for Xvo m Table [lOl 

Comparing how well the model line-of-sight second ve- 
locity moment matches the theoretical moment for a given 
value of a (Figure [T5|) shows that a good match is achieved 
for a — —0.9. For a = —0.5, the model overestimates the 
theoretical moment implying that the magnitude of a needs 
to be increased, and for a = —1.5 underestimates it implying 
that a reduction in the magnitude of a is required. 

Using a = — 0.90tn'q c, the mass within 3 core radii 
is (9.7 ± 2.3) x 10 7 M©. felevna et all l |2002h achieved a 
value of 8^2 x 10 7 Mq . Taking Draco's V-band lu minosity as 
(1.8±0.8)xl0 5 L (|lrwin fc Hatzidimitrioill995f). the mass- 
to- ligh t ratio for Draco is 539 ± 136 Mq/Lq. Klevna et al.l 
( 2002) obtained a lower value of 440 ± 240 M /L which is 
not surprising given they were using an anisotropic disper- 
sion model. 
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7 CONCLUSION 

Other authors have commented on the potential of Syer & 
Tremaine's made-to-measure-method. We hope in this pa- 
per that we have helped to expose some of that potential 
in a practical way. We believe we have added clarity to the 
construction of the weight evolution equation, shown how 
constraints may be incorporated and defined 2 further con- 
straints. We have demonstrated a simple application of M2M 
modelling by determining the mass-to-light ratio of a the- 
oretical Plummer model. With a variety of constraints and 
initial conditions, we arrive in all cases at a model value close 
to the true value. Using the method with an isotropic veloc- 
ity dispersion model, we estimate the mass-to-light ratio of 
Draco and achieve a V-band value of 539 ± 136 Mq/Lq. 
We encourage others to use the method - particularly, 



Figure 15. Comparison of the model produced line-of-sight sec- 
ond velocity moment (solid points) with the theoretical moment 
for a = —0.5 (top panel), a = —0.9 (middle panel), and a = —1.5 
(bottom panel). 

for comparison purposes, those who have experience of us- 
ing Schwarzschild's method. It should be noted that much 
of the preparatory work (whether theoretical, or manipu- 
lation of observational data, say) leading to the execution 
of a Schwarzschild or a M2M model is in fact common. 
The key difference in the methods is how the final parti- 
cle weights are determined. Also, the orbit selection stage 
in a Schwarzschild model is not required for a M2M model. 
The only shortfall against Schwarzschild's method we are 
aware of is the modelling of proper motion data. 
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We have given insight into how a made-to-measure 
model may be implemented and provided some information 
on its behaviour and how to size and tune such a model. We 
have defined a simple mechanism for assessing the degree 
of particle weight convergence and are not aware of such a 
mechanism being used elsewhere with M2M models. From 
a software perspective, our first unparallelised implementa- 
tion was less than 2000 lines of C code so the effort required 
to establish a working prototype is not huge. 

As this paper was being completed, iDehnenl (j2009l ) be- 
came available and there is some overlap with this paper 
notably in total weight conservation and the use of the j3 
anisotropy parameter as a constraint. 

Our next steps are to continue the move away from us- 
ing theoretical models and apply the method to observations 
of more, real galaxies, and to understand practically the rel- 
ative strengths of Schwarzschild's method and the made-to- 
measure method. 
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